Delay Scheduling Based Replication Scheme for Hadoop Distributed File System

Author

  • S. Suresh

Abstract

The volume of data generated and processed by modern computing systems is growing rapidly. MapReduce is an important programming model for large-scale, data-intensive applications. Hadoop is a popular open-source implementation of MapReduce and the Google File System (GFS). Its scalability and fault tolerance have made Hadoop a standard platform for Big Data processing. Hadoop stores data in the Hadoop Distributed File System (HDFS), which achieves data reliability and fault tolerance through replication. This paper proposes a new technique, the Delay Scheduling Based Replication Algorithm (DSBRA), which uses information collected from the scheduler to identify popular files/blocks and replicate them, and to identify unpopular ones and dereplicate them. Experimental results show that the proposed method improves response time by 13% and locality by 7% over existing algorithms.
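The abstract does not spell out DSBRA's decision rules, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a delay scheduler in the style of the one used by Hadoop's Fair Scheduler. When a free slot is offered on a node that holds no replica of the waiting task's block, the task declines up to `max_skips` offers before it is launched non-locally. Each skip or non-local launch is counted as "locality pressure" on that block. A periodic pass then adds a replica to blocks whose pressure reaches a threshold (up to a cap). It also removes one replica from over-replicated blocks that saw no activity, down to the HDFS default of 3. The classes `Task`, `DelayScheduler` and `ReplicaManager`, the counters, and the thresholds are all hypothetical. Placement is a plain in-memory dict, not any HDFS API.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical map task that reads a single HDFS block."""
    block: str
    skips: int = 0  # offers declined while waiting for a data-local node


class DelayScheduler:
    """Toy delay scheduler: the head task waits up to max_skips offers
    for a node holding a replica of its block, then runs non-locally."""

    def __init__(self, placement: dict[str, set[str]], max_skips: int = 2):
        self.placement = placement   # block -> nodes holding a replica (shared dict)
        self.max_skips = max_skips
        self.pressure = Counter()    # block -> skips + non-local launches
        self.accesses = Counter()    # block -> tasks launched on it

    def offer(self, node: str, queue: list[Task]) -> tuple[Task, bool] | None:
        """Offer one free slot on `node`; return (task, is_local) or None."""
        for i, task in enumerate(queue):
            if node in self.placement[task.block]:
                queue.pop(i)         # data-local task found: launch it
                self.accesses[task.block] += 1
                return task, True
        if not queue:
            return None
        head = queue[0]
        if head.skips < self.max_skips:
            head.skips += 1          # decline the slot and keep waiting
            self.pressure[head.block] += 1
            return None
        queue.pop(0)                 # waited long enough: give up on locality
        self.accesses[head.block] += 1
        self.pressure[head.block] += 1
        return head, False


class ReplicaManager:
    """Hypothetical periodic pass that replicates blocks under locality
    pressure and dereplicates idle, over-replicated blocks."""

    def __init__(self, sched: DelayScheduler, nodes: list[str],
                 default_rf: int = 3, max_rf: int = 5, hot_threshold: int = 2):
        self.sched = sched
        self.nodes = nodes
        self.default_rf = default_rf
        self.max_rf = max_rf
        self.hot_threshold = hot_threshold

    def rebalance(self) -> dict[str, int]:
        """Adjust replica sets in place and return block -> replica count."""
        for block, holders in self.sched.placement.items():
            pressure = self.sched.pressure[block]
            if pressure >= self.hot_threshold:
                spare = sorted(set(self.nodes) - holders)
                if len(holders) < self.max_rf and spare:
                    holders.add(spare[0])        # deterministic pick for the sketch
            elif (pressure == 0 and self.sched.accesses[block] == 0
                  and len(holders) > self.default_rf):
                holders.remove(max(holders))     # drop one surplus replica
        self.sched.pressure.clear()              # start a fresh observation window
        self.sched.accesses.clear()
        return {b: len(h) for b, h in self.sched.placement.items()}


nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]
placement = {"b_hot": {"n1", "n2", "n3"}, "b_cold": {"n1", "n2", "n3", "n4"}}
sched = DelayScheduler(placement, max_skips=2)
mgr = ReplicaManager(sched, nodes)
queue = [Task("b_hot") for _ in range(3)]
launched = [sched.offer(n, queue) for n in ["n4", "n5", "n6", "n1"]]
result = mgr.rebalance()
print(result)  # {'b_hot': 4, 'b_cold': 3}
```

In this run the first task for `b_hot` declines offers on n4 and n5, then runs non-locally on n6, so the pass raises that block's replica count from 3 to 4. The never-read `b_cold` drops from 4 replicas to 3. Skips are used as the signal because a skip means demand for a block outran the free slots on nodes holding it, which is what an extra replica would relieve. Real HDFS sets the replication factor per file (for example with `hdfs dfs -setrep`), so a block-level policy like this one would have to be mapped onto files.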

Similar resources

Efficient Data Replication Scheme based on Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is designed to store huge data sets reliably and has been widely used for processing massive-scale data in parallel. In HDFS, the data locality problem is one of the critical problems that degrade file system performance. To solve the data locality problem, we propose an efficient data replication scheme based on access count prediction in a Hadoop...

Adaptive Data Replication Scheme Based on Access Count Prediction in Hadoop

Hadoop, an open source implementation of the MapReduce framework, has been widely used for processing massive-scale data in parallel. Since Hadoop uses a distributed file system, called HDFS, the data locality problem often happens (i.e., a data block should be copied to the processing node when a processing node does not possess the data block in its local storage), and this problem leads to t...

A Comparative Analysis of MapReduce Scheduling Algorithms for Hadoop

Today’s digital era has caused an escalation in the size of datasets. These datasets are termed “Big Data” because of their massive volume, variety and velocity, and they are stored in a distributed file system architecture. Hadoop is a framework that supports the Hadoop Distributed File System (HDFS) for storage and MapReduce for processing of large data sets in a distributed computing environment. Task assignment is pos...

Performance Evaluation of Stream Log Collection Using HADOOP Distributed File System

Stream logging has recently been widely adopted by web-based and product-based companies. Stream logging is one of the most important topics on the agenda of business re-engineering. Business re-engineering is done to improve the effectiveness and productivity of a particular product or service. Stream logging is achieved with minimum cost using a transaction-based model over a distribu...

An Experimental Evaluation of Performance of A Hadoop Cluster on Replica Management

Hadoop is an open-source implementation of the MapReduce framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large datasets across a cluster of workstations. To handle massive-scale data, Hadoop uses the Hadoop Distributed File System (HDFS). HDFS, similar to most distributed file systems, sh...

Publication date: 2015